A Comparison of Text Categorization Methods
نویسنده
چکیده
In this paper firstly I have compared Single Label Text Categorization with Multi Label Text Categorization in detail then I have compared Document Pivoted Categorization with Category Pivoted Categorization in detail. For this purpose I have given the general definition of Text Categorization with its mathematical notation for the purpose of its frugality and cost effectiveness. Then with the help of mathematical notation and set theory ,I have converted the general definitions of Single Label Text Categorization and Multi Label Text Categorization into their respective mathematical representation .Then I discussed Binary Text Categorization as a special case of Single Label Text Categorization. After comparison of Single Label Text Categorization with Multi Label Text Categorization, I found that Single Label Text Categorization or Binary Text Categorization is more general than Multi Label Text Categorization. Thereafter I discussed an algorithm for transformation of Multi Label Classification into Binary Classification and explained the conditions of transformation of Multi Label Classification into Binary Classification. In the second step I compared Document Pivoted Categorization with Category Pivoted Categorization in detail. After comparison we found that Category Pivoted Categorization is more typical and complex than Document Pivoted Categorization. The Category Pivoted Categorization becomes more complicated when new category is added to predefined set of categories and the recurrent classification of documents takes place. Finally I compared Hard Categorization with Ranking Categorization. After comparing them I found that Hard Categorization incorporates ‘Hard Decisions’ about the relevance or belonging of a document to a category. This hard decision is either completely true or completely false. Whereas the Ranking Categorization creates a belonging of a document to a category according to the estimated appropriateness to the document. The final Ranked List is developed in the Ranking Categorization which is used by the human expert for final decision of Text Categorization.
منابع مشابه
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملAn Empirical Comparison of Text Categorization Methods
In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results obtained using the Mean Recipro...
متن کاملFeature Preparation in Text Categorization
Text categorization is an important application of machine learning to the field of document information retrieval. Most machine learning methods treat text documents as a feature vectors. We report text categorization accuracy for different types of features and different types of feature weights. The comparison of these classifiers shows that stemmed or un-stemmed single words as features giv...
متن کاملA Comparison and Semi-Quantitative Analysis of Words and Character-Bigrams as Features in Chinese Text Categorization
Words and character-bigrams are both used as features in Chinese text processing tasks, but no systematic comparison or analysis of their values as features for Chinese text categorization has been reported heretofore. We carry out here a full performance comparison between them by experiments on various document collections (including a manually word-segmented corpus as a golden standard), and...
متن کاملText Classification with Compression Algorithms
This work concerns a comparison of SVM kernel methods in text categorization tasks. In particular I define a kernel function that estimates the similarity between two objects computing by their compressed lengths. In fact, compression algorithms can detect arbitrarily long dependencies within the text strings. Data text vectorization looses information in feature extractions and is highly sensi...
متن کامل